Spectral Clustering of Mixed-Type Data

Authors

Abstract

Cluster analysis seeks to assign objects with similar characteristics into groups called clusters, so that objects within a group are similar to each other and dissimilar to objects in other groups. Spectral clustering has been shown to perform well in different scenarios on continuous data: it can detect convex and non-convex clusters, as well as overlapping clusters. However, the restriction to continuous data can be limiting in real applications, where data are often of mixed type, i.e., they contain both continuous and categorical features. This paper looks at extending spectral clustering to mixed-type data. The new method replaces the Euclidean-based similarity distance used in conventional spectral clustering with separate dissimilarity measures for continuous and categorical variables. A global dissimilarity measure is then computed using a weighted sum, and a Gaussian kernel is used to convert the dissimilarity matrix into a similarity matrix. The new method includes automatic tuning of the variable weight parameter. Its performance is compared with that of two state-of-the-art methods for mixed-type data, k-prototypes and KAMILA, on several simulated data sets.
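The pipeline described in the abstract (separate dissimilarities for continuous and categorical variables, a weighted sum, a Gaussian kernel, then spectral clustering) can be illustrated with the minimal plain-Python sketch below. It is not the paper's implementation, and several choices are assumptions: Euclidean distance for continuous variables and simple matching (mismatch count) for categorical ones, the kernel form exp(-d^2 / (2 sigma^2)), and fixed `weight` and `sigma` inputs instead of the automatic tuning the paper describes. The spectral step is also simplified to two clusters. It splits points by the sign of the second eigenvector of the normalized affinity matrix, found by power iteration, rather than running k-means on several eigenvectors. All function names are hypothetical.

```python
import math


def mixed_dissimilarity(a, b, weight):
    """Weighted sum of a continuous and a categorical dissimilarity (assumed measures).

    Each record is a (continuous_values, categorical_values) pair of tuples.
    Continuous part: Euclidean distance. Categorical part: number of mismatches
    (simple matching). `weight` in [0, 1] is a fixed, hypothetical parameter.
    """
    (xa, ca), (xb, cb) = a, b
    d_cont = math.dist(xa, xb)
    d_cat = sum(u != v for u, v in zip(ca, cb))
    return weight * d_cont + (1 - weight) * d_cat


def gaussian_similarity(data, weight, sigma):
    """Convert pairwise mixed dissimilarities into similarities with a Gaussian kernel."""
    n = len(data)
    return [
        [math.exp(-mixed_dissimilarity(data[i], data[j], weight) ** 2 / (2 * sigma**2))
         for j in range(n)]
        for i in range(n)
    ]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_bipartition(sim, iters=500):
    """Two-way spectral split by the sign of the second eigenvector (simplified)."""
    n = len(sim)
    deg = [sum(row) for row in sim]
    inv_sqrt = [1 / math.sqrt(d) for d in deg]
    # Shifted normalized affinity I + D^{-1/2} S D^{-1/2}: its eigenvalues lie in [0, 2],
    # so power iteration converges to the largest remaining eigenvalue.
    m = [[inv_sqrt[i] * sim[i][j] * inv_sqrt[j] + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # The leading eigenvector is proportional to sqrt(degree); project it out each step.
    top = _normalize([math.sqrt(d) for d in deg])

    def deflate(v):
        proj = sum(a * b for a, b in zip(v, top))
        return [a - proj * b for a, b in zip(v, top)]

    v = _normalize(deflate([(-1) ** i + 0.1 * i for i in range(n)]))  # deterministic start
    for _ in range(iters):
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = _normalize(deflate(v))
    # Cluster label = sign of the corresponding entry of the second eigenvector.
    return [0 if x < 0 else 1 for x in v]


# Toy mixed-type data: two continuous features and one categorical feature.
data = [
    ((0.0, 0.0), ("red",)), ((0.2, 0.1), ("red",)), ((0.1, 0.3), ("red",)),
    ((5.0, 5.0), ("blue",)), ((5.2, 4.9), ("blue",)), ((4.9, 5.1), ("blue",)),
]
labels = spectral_bipartition(gaussian_similarity(data, weight=0.5, sigma=1.0))
print(labels)  # the first three and last three records receive different labels
```

Which group is labelled 0 and which 1 depends on the sign of the computed eigenvector, so only the partition is meaningful. Extending the sketch to k clusters would mean keeping the top k eigenvectors and clustering their rows, as in standard spectral clustering.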

Similar resources

Integrative Parameter-Free Clustering of Data with Mixed Type Attributes

Integrative mining of heterogeneous data is one of the major challenges for data mining in the next decade. We address the problem of integrative clustering of data with mixed type attributes. Most existing solutions suffer from one or both of the following drawbacks: Either they require input parameters which are difficult to estimate, or/and they do not adequately support mixed type attribute...


Clustering of samples and variables with mixed-type data

Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then vi...


CluMix: Clustering and Visualization of Mixed-Type Data

In real data situations various factors of interest are measured on different scales, e.g. quantitative gene expression values and categorical clinical features like gender, disease stage etc. In many cases (pre-selected) gene expression data are visualized in heatmaps, while further patient characteristics are only added “informatively” on top. This can be visually quite confusing in case ther...


Spectral Clustering of Data Streams

The data is modeled as an m × n matrix P of n points, each m-dimensional, arriving as a stream, one point at a time. Our algorithm has two components: a stream framework StrFr and a spectral clustering algorithm SpCl. SpCl works on an adjacency matrix representation of the points by recursively producing cuts on the graph thus defined. StrFr takes in the stream in chunks of size n1, converts a chu...


Journal

Journal title: Stats

Year: 2021

ISSN: 2571-905X

DOI: https://doi.org/10.3390/stats5010001